For this workshop, we will be using R via RStudio.
You can think of R like a car’s engine, while RStudio is like a car’s dashboard.
Just as we drive a car by interacting with its dashboard rather than directly with the engine, we won’t be using R directly.
Instead, we will be using RStudio’s interface.
After you open RStudio, you should see the following 3 panels:
R packages extend the functionality of R by providing additional functions, data and documentation.
A phone analogy helps here: let’s say you’ve purchased a new phone (a brand-new R/RStudio install) and you want to take a photo (do some data analysis) and share it with your friends and family. To do that, you need to:
This process is very similar when you are using an R package. You need to:
install.packages("tidyverse")
library(tidyverse)
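Installing is a one-time step, while loading with library() is needed in every new R session. As a minimal sketch of that distinction, the snippet below checks whether a package is already installed before suggesting an install. The helper name is_installed is hypothetical and not part of R or the tidyverse; requireNamespace() is base R.

```r
# Hypothetical helper: TRUE if the package is already installed.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Only suggest installing when the package is missing (the one-time step).
if (!is_installed("tidyverse")) {
  message("tidyverse is not installed; run install.packages(\"tidyverse\") once.")
}

# "stats" ships with R, so this check should succeed on any install.
is_installed("stats")
```

Once a package is installed, only the library() call needs to be repeated at the start of each session.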
See ModernDive Chapter 1 for further reading.
One day you will need to quit R, go do something else and return to your analysis later.
One day you will be running multiple analyses in R and you want to keep them separate.
One day you will need to bring data from the outside world into R and present results and figures from R back out to the world.
So how do you know which parts of your analysis are “real”, and where does your analysis “live”?
The working directory is where R will look, by default, for files you ask it to load or save.
You can explicitly check your working directory with:
getwd()
[1] "/home/travis/Documents/Repositories/introrworkshop/scripts"
It is also displayed at the top of the RStudio console.
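To see what "where R will look by default" means in practice, here is a small sketch of how a relative path is resolved against the working directory. The file name my_file.csv and the data folder are made up for illustration; nothing is read or written.

```r
# Where R is currently looking for files.
wd <- getwd()

# A relative path (hypothetical folder and file name) ...
rel_path <- file.path("data", "my_file.csv")

# ... is treated as if it were joined onto the working directory.
full_path <- file.path(wd, rel_path)
print(full_path)

# Both forms point at the same location, so they agree on whether the file exists.
file.exists(rel_path) == file.exists(full_path)
```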
DO NOT USE setwd unless you want Jenny Bryan to set your computer on fire!
So what’s wrong with:
setwd("/Users/amy/fuzzy_alpaca/cute_animals/foofy/data")
df <- read.delim("raw_foofy_data.csv")
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave("../figs/foofy_scatterplot.png")
The chance of the setwd() command having the desired effect (making the file paths work) for anyone besides its author is 0%. It might not even work for the author a year or two from now. Your data analysis project is then not self-contained and portable, which makes recreating the plot on another machine practically impossible.
Read more here: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
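One possible alternative, sketched below in base R only, is to build every path from a single project root rather than from a hard-coded user directory. This is an illustration under assumptions, not the approach from the linked article: the temporary folder stands in for a real project directory, the tiny data set is made up, and the names project_root and data_path are hypothetical.

```r
# Stand-in for a project root; in a real project this would be the project folder.
project_root <- file.path(tempdir(), "foofy_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Build the path from the root with file.path() instead of hard-coding /Users/...
data_path <- file.path(project_root, "data", "raw_foofy_data.csv")

# Write a tiny made-up data set, then read it back through the same path.
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)), data_path, row.names = FALSE)
df <- read.csv(data_path)
print(df)
```

Because only project_root depends on the machine, moving the project folder or sharing it with someone else does not break the remaining paths.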
Typically, I organize each data analysis into a project using an RStudio Project. I tend to have a directory each for: